This study investigates energy consumption in the Bay Area from the start of 2017 through the first half of 2020. By focusing on commercial and residential electricity and gas usage, I hope to infer how Bay Area energy consumption changed as a result of the COVID-19 pandemic following its onset in March 2020.
First, I pulled the PG&E data from 2017-2020 for both electricity and gas usage and combined the separate quarterly .csv files into one comprehensive dataframe. I kept only the residential and commercial usage types, and I converted gas and electricity usage into kBTUs so that the two can be compared directly.
library(tidyverse)
library(plotly)
types <- c("Electric","Gas")
years <- 2017:2020
quarters <- 1:4
pge_data <- NULL
for (type in types) {
  for (year in years) {
    for (quarter in quarters) {
      # Account for missing Q3 and Q4 in 2020
      if ((year == 2020) && (quarter %in% 3:4)) {
        next # skip
      }
      # Build the correct file name & read it into a temporary variable
      filename <- paste0("PGE_", year, "_Q", quarter, "_", type, "UsageByZip.csv")
      temp <- read_csv(filename)
      # Filter for just Commercial and Residential use types
      temp <- filter(temp, CUSTOMERCLASS %in% c("Elec- Residential", "Elec- Commercial", "Gas- Residential", "Gas- Commercial"))
      # Convert to kBTUs
      if (type == "Electric") {
        temp$TOTALKWH <- temp$TOTALKWH * 3.412 # 1 kWh = 3.412 kBTUs
      } else { # Gas
        temp$TOTALTHM <- temp$TOTALTHM * 100 # 1 therm = 100 kBTUs
      }
      # Rename the usage column (column 7 in both file types) to a common name
      names(temp)[7] <- "TOTALKBTU"
      # Add the first seven columns to the running data frame
      pge_data <- rbind(pge_data, temp[, 1:7])
    }
  }
}
# Save the combined data once, after all files have been read
saveRDS(pge_data, "pge_data.rds")
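The unit conversion used in the loop can be checked in isolation. The following is a minimal sketch in base R; `to_kbtu()` is a hypothetical helper that is not part of the original analysis, and the sample usage values are made up for illustration.

```r
# Conversion factors stated in the analysis: 1 kWh = 3.412 kBTU, 1 therm = 100 kBTU
KBTU_PER_KWH <- 3.412
KBTU_PER_THERM <- 100

# Hypothetical helper: convert a usage vector to kBTU based on the fuel type
to_kbtu <- function(usage, type) {
  conversion <- switch(type,
    Electric = KBTU_PER_KWH,
    Gas = KBTU_PER_THERM,
    stop("Unknown type: ", type)
  )
  usage * conversion
}

elec_kbtu <- to_kbtu(c(1000, 2500), "Electric") # illustrative kWh values
gas_kbtu <- to_kbtu(c(40, 75), "Gas")           # illustrative therm values
```

Putting both fuels on the same kBTU scale is what makes it possible to stack or compare them in a single chart.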
Next, I narrowed the data to the area of interest by keeping only the zip codes in the 9 counties that comprise the Bay Area. I also added geographic information about each of these zip codes for use in mapping the results later.
# Filter out only the Bay Area zip codes
library(sf)
library(tigris)
library(leaflet)
# Import county data from tigris & get just the Bay
ca_counties <- counties("CA", cb = TRUE, progress_bar = FALSE)
bay_county_names <-
  c("Alameda", "Contra Costa", "Marin", "Napa", "San Francisco", "San Mateo",
    "Santa Clara", "Solano", "Sonoma")
bay_counties <- ca_counties %>% filter(NAME %in% bay_county_names)
# Get zip code data from tigris & get just the Bay
usa_zips <- zctas(cb = TRUE, progress_bar = FALSE)
bay_zips <-
  usa_zips %>%
  st_centroid() %>%
  .[bay_counties, ] %>%
  st_set_geometry(NULL) %>%
  left_join(usa_zips %>% select(GEOID10), by = "GEOID10") %>%
  st_as_sf()
# Filter pge data to be just the Bay area zip codes
pge_data_bay <-
  pge_data %>%
  mutate(
    ZIPCODE = ZIPCODE %>% as.character()
  ) %>%
  group_by(ZIPCODE) %>%
  right_join(
    bay_zips %>% select(GEOID10),
    by = c("ZIPCODE" = "GEOID10")
  ) %>%
  st_as_sf() %>%
  st_transform(4326)
To begin interpreting the data, I created monthly bar charts of electricity and gas usage from January 2017 through June 2020, one for residential customers and one for commercial customers. Separating the two customer classes makes the trends easier to see.
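The plotting code is not shown in this report. As a rough sketch of how such a chart could be built, the example below sums usage by month and customer class and draws a stacked bar per month with base R graphics. The `toy` data frame and its values are invented for illustration; only the column names follow the PG&E data.

```r
# Toy data in the same shape as the combined PG&E data (values are made up)
toy <- data.frame(
  ZIPCODE = rep(c("94000", "94001"), each = 4),
  YEAR = 2020,
  MONTH = rep(c(3, 3, 4, 4), times = 2),
  CUSTOMERCLASS = rep(c("Elec- Residential", "Gas- Residential"), times = 4),
  TOTALKBTU = c(10, 20, 12, 15, 30, 40, 33, 35)
)

# Sum usage over all zip codes for each customer class and month
monthly <- aggregate(TOTALKBTU ~ CUSTOMERCLASS + MONTH + YEAR, data = toy, FUN = sum)

# Reshape to a class-by-month matrix and draw one stacked bar per month
totals <- xtabs(TOTALKBTU ~ CUSTOMERCLASS + MONTH, data = monthly)
barplot(totals, legend.text = TRUE, xlab = "Month", ylab = "Total kBTU")
```

Stacking electricity and gas in one bar is only meaningful because both were converted to kBTU beforehand.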
The figures show a number of interesting trends. First, there is a clear seasonality to all of the data: for both residential and commercial customers, gas usage peaks in the winter and electricity usage peaks in the summer. This is logical, as gas is typically used for heating and electricity for cooling. It is also interesting that total energy usage is much higher for residential customers than for commercial ones; of the two customer classes considered here, homes account for most of the energy use.
To understand the effect that COVID-19 had on energy usage patterns, we turn our attention to the end of both graphs, where the last four bars correspond to March, April, May, and June of 2020. Looking first at the residential plot, March looks roughly the same as in the previous three years, but April is noticeably higher. From 2017-2019, April brought a substantial drop in total energy usage to ~1e10 kBTUs; in 2020, however, the April bar is closer to 1.25e10 kBTUs.
The opposite relationship is apparent in the commercial plot, and the difference is even more stark: total commercial energy usage drops dramatically for April, May, and June. This is consistent with the shelter-in-place order that took effect in mid-March 2020, after which very few businesses were allowed to remain fully open.
To further explore the more subtle residential relationship, I investigated which zip codes experienced the most change in residential electricity usage relative to the average of previous years. I first selected the months March-June for 2017-2019, and then for 2020. I averaged these values for each zip code, and then found the percent change from the baseline (2017-2019) values to the post-COVID (2020) values. Finally, I plotted this on an interactive map of the Bay Area.
Note: This analysis assumes that, absent the pandemic, electricity usage for a given month is steady from year to year. The cyclic pattern of energy usage in the bar charts supports this assumption, but any underlying year-over-year trend would still appear on the map as part of the change.
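Before the full computation, the averaging and percent-change steps can be illustrated on made-up numbers. This is a minimal sketch in base R: the `usage` data frame is hypothetical, and unlike the analysis code it divides by the plain baseline average rather than the baseline plus one.

```r
# Hypothetical March-June residential electricity usage (kBTU) for two zip codes
usage <- data.frame(
  ZIPCODE = rep(c("94000", "94001"), each = 4),
  YEAR = rep(c(2017, 2018, 2019, 2020), times = 2),
  TOTALKBTU = c(100, 110, 90, 120, 200, 200, 200, 190)
)

# Average the baseline years (2017-2019) and the 2020 values per zip code
pre <- aggregate(TOTALKBTU ~ ZIPCODE, data = subset(usage, YEAR %in% 2017:2019), FUN = mean)
post <- aggregate(TOTALKBTU ~ ZIPCODE, data = subset(usage, YEAR == 2020), FUN = mean)
names(pre)[2] <- "AVG_KBTU_PRE"
names(post)[2] <- "AVG_KBTU_POST"

# Percent change of the 2020 average relative to the baseline average
change <- merge(pre, post, by = "ZIPCODE")
change$KBTU_CHANGE <- 100 * (change$AVG_KBTU_POST - change$AVG_KBTU_PRE) / change$AVG_KBTU_PRE
```

In this toy example the first zip code rises from a baseline of 100 to 120 (a 20% increase) and the second falls from 200 to 190 (a 5% decrease).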
# Explore the change in residential electricity usage pre- and post-pandemic
# Get just the residential electricity pre-covid (March-June) and average it
pge_data_bay_precvd <- pge_data_bay %>%
  filter(CUSTOMERCLASS == "Elec- Residential", MONTH %in% 3:6, YEAR %in% 2017:2019) %>%
  aggregate(TOTALKBTU ~ ZIPCODE, ., mean)
names(pge_data_bay_precvd)[2] <- "AVG_KBTU_PRE"
# Filter to get the post-covid data frame
pge_data_bay_postcvd <- pge_data_bay %>%
  filter(CUSTOMERCLASS == "Elec- Residential", MONTH %in% 3:6, YEAR == 2020) %>%
  aggregate(TOTALKBTU ~ ZIPCODE, ., mean)
names(pge_data_bay_postcvd)[2] <- "AVG_KBTU_POST"
# Find the percent difference pre- and post-covid for each zipcode
pge_data_bay_cvd <- inner_join(pge_data_bay_precvd, pge_data_bay_postcvd, by = "ZIPCODE")
# The +1 in the denominator guards against division by zero for zip codes with no baseline usage
pge_data_bay_cvd <- mutate(pge_data_bay_cvd, KBTU_CHANGE = 100 * (AVG_KBTU_POST - AVG_KBTU_PRE) / (AVG_KBTU_PRE + 1))
# Add the geometry back
pge_data_bay_cvd <- distinct(merge(pge_data_bay_cvd, select(pge_data_bay, ZIPCODE, geometry), by = "ZIPCODE", all = FALSE))
pge_data_bay_cvd2 <- pge_data_bay_cvd %>%
  mutate(
    ZIPCODE = ZIPCODE %>% as.character()
  ) %>%
  group_by(ZIPCODE) %>%
  summarize(
    KBTU_CHANGE = sum(KBTU_CHANGE, na.rm = FALSE)
  ) %>%
  right_join(
    bay_zips %>% select(GEOID10),
    by = c("ZIPCODE" = "GEOID10")
  ) %>%
  st_as_sf() %>%
  st_transform(4326)
# Create the map!
res_pal <- colorNumeric(
  palette = "Blues",
  domain = pge_data_bay_cvd2$KBTU_CHANGE
)
leaflet() %>%
  addTiles() %>%
  addPolygons(
    data = pge_data_bay_cvd2,
    fillColor = ~res_pal(KBTU_CHANGE),
    color = "white",
    opacity = 0.5,
    fillOpacity = 0.5,
    weight = 1,
    label = ~paste0(
      round(KBTU_CHANGE),
      "% change in ",
      ZIPCODE
    ),
    highlightOptions = highlightOptions(
      weight = 2,
      opacity = 1
    )
  ) %>%
  addLegend(
    data = pge_data_bay_cvd2,
    pal = res_pal,
    values = ~KBTU_CHANGE,
    title = "Percent change<br>in avg. residential<br>electricity usage"
  )
First, the map shows that across the entire Bay Area, average residential electricity usage increased from the baseline condition. The increase appears roughly uniform at ~10% for most of the Bay, but it is most dramatic in northern Marin County near Point Reyes National Seashore, where one zip code saw an increase as large as 144%, which is more than double (roughly 2.4 times) its baseline electricity usage.
To better understand the rest of the Bay Area, I filtered out the five largest outliers, all of which are near Point Reyes and have a percent change larger than 55%.
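The filtering code is not shown in this report. A minimal sketch of the step, assuming a data frame with the same `ZIPCODE` and `KBTU_CHANGE` columns as `pge_data_bay_cvd2` and using the 55% threshold mentioned above, might look like the following. The `changes` values are invented for illustration, and keeping zip codes with missing data is an assumption rather than the author's stated choice.

```r
# Hypothetical percent changes per zip code (NA = no usage data for that zip code)
changes <- data.frame(
  ZIPCODE = c("94000", "94001", "94002", "94003"),
  KBTU_CHANGE = c(8, 12, 144, NA)
)

# Threshold from the text: drop zip codes whose change exceeds 55%
threshold <- 55

# Keep rows at or below the threshold; rows with missing values are kept
# so that their polygons would still be drawn on a map
trimmed <- changes[is.na(changes$KBTU_CHANGE) | changes$KBTU_CHANGE <= threshold, ]
```

A colour palette rebuilt from the trimmed values would no longer be stretched by the outliers, which should make smaller differences across the rest of the Bay easier to see.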
This new map reinforces the conclusion drawn before: the rest of the Bay Area shows a fairly uniform increase in residential electricity usage. Though some areas show a slightly larger increase, such as downtown Oakland and San Francisco, in general the behavior is uniform across the Bay. I’m left to wonder whether the dramatic increase in usage in the remote northern Marin zip codes was truly caused by COVID, or rather by an increase in development in those areas. However, I can state with confidence that average electricity usage from March to June rose in 2020 compared to the three previous years in the vast majority of Bay Area zip codes.